EXTENSIBLE FRAMEWORK FOR HANDLING DIFFERENT MARK UP
LANGUAGE PARSERS AND GENERATORS IN A COMPUTING DEVICE



FIELD OF THE INVENTION 

This invention relates to an extensible framework for handling different mark up
language parsers and generators. The invention finds application in a computing device.
Its advantages are especially valuable for a resource constrained mobile computing
device, i.e. a battery powered, portable device in which there are power and memory
constraints, but are relevant to other kinds of computing devices, such as desktop PCs,
set top boxes etc.



DESCRIPTION OF THE PRIOR ART 

Mark-up language is a set of codes in a text or binary file that enable a computing device
to format the text for correct display or printing. A client (i.e. any process that requests a
service from another process) in a software system creates mark-up language using a
'generator'. It reads and interprets mark-up language using a 'parser'.

In the prior art, parsers and generators have been specific to certain kinds of mark-up
languages. For example, a client could use an XML (extensible mark-up language) parser
to interpret and handle XML files; it could use a separate WBXML (WAP binary XML)
parser to interpret and handle WBXML files. In recent years, there has been a proliferation
of different mark-up languages: the conventional approach has been to deploy in device
ROM several separate, fully functioning parsers, one for each mark up language that the
device needs to parse. This inevitably takes up valuable ROM space; as there may need to be
an instantiation of each parser at the same time, valuable RAM may also be occupied.

Furthermore, in the prior art, the client talks directly to the XML parser and the separate
WBXML parser. Also, when the client needs to generate mark-up language format files,
there could be an XML generator and a separate WBXML generator. Again, the client
would talk directly to each generator. Clients therefore have had to be hard-coded to
handle and talk directly with these specific kinds of parsers and generators; in practice,
this has meant that clients are either extremely complex if they need to handle several
different mark up language formats (further increasing the demand on both ROM and
RAM) or else they are restricted to a single mark-up language format.

SUMMARY OF THE PRESENT INVENTION

The present invention is a computing device programmed with an extensible framework
that accepts one or more mark-up language parsers and/or generators, each implemented
as plug-ins to the framework, with different plug-ins enabling different kinds of mark up
languages to be handled by the device.

The framework is an API (which term includes a set of APIs) that enables different types
of mark-up parsers and generators to be included in the framework by means of the
parser/generator plug-ins. A plug-in is a replaceable item of executable code that
provides specific services to a loosely coupled application that can load or invoke it at
run-time; it can therefore extend the framework at run-time (i.e. there is no need to
recompile or change the framework for a plug-in to work).

This approach has many advantages over the conventional approach of hard-coding
clients to specific parsers and generators. Because of the extensible plug-in design, it is
possible to allow new kinds of parsers and generators to be loaded onto a device after
that device has been shipped to an end-user. The only requirement is that they are
implemented as plug-ins that are compatible with the extensible framework. This is
especially useful in the context of mark up language parsers and generators since there
are many potential languages that might need to be handled by a device but it is
impractical to hard-code the capability to handle all of these when the device is designed
because of the memory overhead.

Hence, a core further technical advantage offered by the present invention is that it
reduces memory requirements; this in turn can lead to faster loading of code and/or less
use of virtual memory. These advantages are especially useful in mobile computing
devices, where techniques that reduce power consumption and extend battery life are
very valuable. The term 'mobile computing device' should be expansively construed to
cover mobile telephones, smartphones, personal organisers, wireless information devices
and any other kind of portable, mobile computing device. But these advantages are also
valuable in fixed computing devices such as PCs, set top boxes, games consoles etc. and
can lead directly to lower BOM (bill of materials) costs because of the lower memory
requirements.

Also, since new mark up languages (or extensions/variants to existing languages,
including new schemas) are frequently developed, the ability to extend an existing design
of computing device to handle these new or extended languages is very useful. The
extensible framework may be a stand alone application or may form part of a device
operating system: if the latter, a particular version of the operating system that includes
the extensible framework can be developed and included in device ROM for a broad
range of computing devices: this version will be able to handle different kinds of mark
up languages when appropriate parser/generator plug-ins are used.

In one implementation, the extensible framework (i) insulates a client running on the
device from having to communicate directly with a parser or generator and (ii) is generic
in that it presents a common API to the client irrespective of the specific kind of parser
or generator deployed. An advantage of this is that it not only allows different parsers
and generators to be readily used by the same client, but also allows several different
clients to share the same parsers and generators. In addition, clients can be far
simpler than prior art designs that could handle several different mark-up language
parsers or generators: this leads to smaller memory requirements in both ROM and
RAM.

Because the framework (e.g. the API) is extensible, extensions to its capabilities (e.g. to
enable a new/extended mark-up language of a document to be handled) can be made
without affecting compatibility with existing clients or existing parsers and generators.
This may be achieved through an updated/extended namespace plug-in; this plug-in sets
up all the elements, attributes and attribute values for a namespace. Similarly, new kinds
of clients can be provided for loading onto a device after that device has been shipped to
an end-user. The only requirement is that they are compatible with the intermediary layer.

The API is typically implemented as a header file. 

The specific kind of parser or generator being used is not known to the client: the
intermediary layer fully insulates the client from needing to be aware of these specifics.
Instead, the client deals only with the intermediary layer, which presents to the client as a
generic parser or a generic generator, i.e. a parser or generator which behaves in a way
that is common to all parsers or generators.

For example, the SyncML protocol supports both XML and WBXML. By using both
XML and WBXML parser and generator plug-ins to the framework, a SyncML client
can use either or both types of parser and generator without knowing about the type of
mark-up language; as a result, the design of the SyncML client is greatly simplified. Since
WBXML and XML are quite different in the way they represent their data, one very
useful feature of the invention is the mapping of WBXML tokens to a string in a static
string pool table. Appendix C expands on this idea.

The present invention may provide a flexible and extensible file conversion system: for
example, the device could parse a document written in one mark up language format and
then use the parsed document data to generate an equivalent document in a different file
format.

Another feature of the present invention is that the mark-up language parser or generator
may access components to validate, pre-filter or alter data, in which the components are
plug-in components to the extensible framework that operate using a 'chain of
responsibility' design pattern. They may be plug-ins to the extensible framework
described above.

Because of the plug-in design of these components, the system is inherently flexible
and extensible compared with prior art systems in which a component (for validating,
pre-filtering or altering data from a parser or generator) would be tied exclusively to a
given parser. Hence, if a mark up language of a document is extended, or a new one
created, it is possible to write any new validation/pre-filter/altering plug-in that is needed
to work with the extended or new language. These new kinds of
validation/pre-filter/altering plug-ins can be provided for loading onto a device even after
that device has been shipped to an end-user. Further, any of these plug-ins will work with any
existing parser or generator that is itself a plug-in to the extensible framework, i.e. uses
the same generic API.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be described with reference to the accompanying drawings, in
which:

Figure 1 is a schematic block diagram of an extensible framework for handling mark up
languages; a parser, generator, client and four further plug-ins are shown;

Figure 2 is a schematic block diagram of a client parsing using a DTD validator and 
auto-corrector; 

Figure 3 is a schematic block diagram of a client using a generator with a DTD validator 
and auto-corrector; 

Figure 4 is a class diagram for the extensible framework; 

Figure 5 is a class diagram of a WBXML parser used to parse SyncML; 

Figure 6 is a sequence diagram for a parser and generator session; 

Figure 7 is a sequence diagram showing DTD validation and auto-correction; 

Figure 8 shows how WBXML tokens map to strings.

DETAILED DESCRIPTION
Overview of Key Features

The present invention is implemented in a system called the Mark-Up Language
Framework, used in SymbianOS from Symbian Limited, London, United Kingdom.
SymbianOS is an operating system for smart phones and advanced mobile telephones,
and other kinds of portable computing devices.

The Mark-Up Language framework implements three key features. 

1. Generic Parser Extensible Framework

Clients are separated from mark-up language parsers/generators by an extensible
framework that accepts one or more mark-up language parsers and/or generators, each
implemented as plug-ins to the framework, with different plug-ins enabling different
kinds of mark up languages to be handled by the device. The extensible framework is in
effect an intermediary (i.e. abstraction) layer that (a) insulates the client from having to
communicate directly with the parser or generator and (b) is generic in that it presents a
common API to the client irrespective of the specific kind of parser or generator the
intermediary layer interfaces with.

2. Data validation/pre-filtering and altering components in a chain of
responsibility

Mark-up language parser or generator plug-ins to the extensible framework can access
components to validate, pre-filter or alter data; the components are plug-in components
to the extensible framework that operate using a 'chain of responsibility' design pattern.

3. Generic Data Supplier API 

The mark-up language parsers or generators can access data from a source using a
generic data supplier API, insulating the parser or generator from having to
communicate directly with the data source.

Each of these features will now be discussed in more detail.

1. Generic Parser Extensible Framework

The essence of this approach is that the parsers and generators are plug-ins to an
extensible framework; the framework is in one implementation part of the operating
system of the device. The present invention may hence readily allow the device to
operate with different kinds of parsers and generators: this extensibility is impossible to
achieve with prior art hard-coded systems.

The client interfaces with a mark up language parser or a generator via the extensible
framework intermediary layer: this layer (a) insulates the client from having to
communicate directly with the parser or generator and (b) is generic in that it presents a
common API to the client irrespective of the specific kind of parser or generator the
intermediary layer interfaces with.

In this way, the client is no longer tied to a single kind of parser or generator; it can
operate with any kind of parser or generator compatible with the intermediary layer, yet it
remains far simpler than prior art clients that are hard-coded to operate directly with
several different kinds of parsers and generators.
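
The arrangement described above, in which a client sees one generic API while
format-specific parsers are supplied as plug-ins, can be illustrated with a minimal sketch
in standard C++. It is not the Symbian OS implementation: the names MarkupParser,
ParserFramework, ToyXmlParser and ToyTokenParser, the registry keyed by a format
string, the two toy document formats and the token values are all assumptions made for
illustration. On Symbian OS, plug-in discovery would be handled by ECOM rather than
by explicit registration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Generic interface that every parser plug-in implements (hypothetical name).
class MarkupParser {
public:
    virtual ~MarkupParser() = default;
    // Returns the names of the elements found in the document, in document order.
    virtual std::vector<std::string> ElementNames(const std::string& doc) = 0;
};

// Toy text plug-in: reports the name of every "<name>" start tag.
class ToyXmlParser : public MarkupParser {
public:
    std::vector<std::string> ElementNames(const std::string& doc) override {
        std::vector<std::string> names;
        for (std::size_t i = 0; i < doc.size(); ++i) {
            if (doc[i] != '<' || i + 1 >= doc.size() || doc[i + 1] == '/') continue;
            std::size_t end = doc.find('>', i);
            if (end == std::string::npos) break;
            names.push_back(doc.substr(i + 1, end - i - 1));
            i = end;
        }
        return names;
    }
};

// Toy binary plug-in: each byte is a token looked up in a fixed table.
// The token values are invented for this sketch.
class ToyTokenParser : public MarkupParser {
public:
    std::vector<std::string> ElementNames(const std::string& doc) override {
        static const std::map<char, std::string> table = {{'\x01', "SyncML"},
                                                          {'\x02', "SyncHdr"}};
        std::vector<std::string> names;
        for (char token : doc) {
            auto it = table.find(token);
            if (it != table.end()) names.push_back(it->second);
        }
        return names;
    }
};

// The framework: plug-ins register a factory under a format name at run time.
class ParserFramework {
public:
    using Factory = std::function<std::unique_ptr<MarkupParser>()>;
    void Register(const std::string& format, Factory factory) {
        factories_[format] = std::move(factory);
    }
    std::unique_ptr<MarkupParser> Open(const std::string& format) const {
        auto it = factories_.find(format);
        if (it == factories_.end()) throw std::runtime_error("no plug-in for " + format);
        return it->second();
    }
private:
    std::map<std::string, Factory> factories_;
};

// Registers the two toy plug-ins; further formats could be added later
// without changing ParserFramework or the client code below.
ParserFramework MakeDefaultFramework() {
    ParserFramework fw;
    fw.Register("xml", [] { return std::unique_ptr<MarkupParser>(new ToyXmlParser); });
    fw.Register("wbxml", [] { return std::unique_ptr<MarkupParser>(new ToyTokenParser); });
    return fw;
}

// Client code: identical for every format, because it sees only MarkupParser.
std::vector<std::string> ParseWith(const ParserFramework& fw, const std::string& format,
                                   const std::string& doc) {
    return fw.Open(format)->ElementNames(doc);
}
```

In this sketch the client function ParseWith contains no format-specific code; supporting
a further mark-up language amounts to one more Register call.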

The API is typically implemented as a header file. The specific kind of parser or
generator being used is not known to the client: the intermediary layer fully insulates the
client from needing to be aware of these specifics. Instead, the client deals only with the
intermediary layer, which presents to the client as a generic parser or a generic generator,
i.e. a parser or generator which behaves in a way that is common to all parsers or
generators.

For example, the SyncML protocol supports both XML and WBXML. By using both
XML and WBXML parser and generator plug-ins to the framework, a SyncML client
can use either or both types of parser and generator without knowing about the type of
mark-up language; as a result, the design of the SyncML client is greatly simplified. Since
WBXML and XML are quite different in the way they represent their data, one very
useful feature of the invention is the mapping of WBXML tokens to a string in a static
string pool table. Appendix C expands on this idea.

The present invention may provide a flexible and extensible file conversion system: for
example, the device could parse a document written in one mark up language format and
then use the parsed document data to generate an equivalent document in a different file
format. Because of the extensible plug-in design of an implementation of the system, it
is possible to provide a far greater range of file conversion capabilities than was previously
the case. New kinds of parsers and generators can be provided for loading onto a device
after that device has been shipped to an end-user. The only requirement is that they are
compatible with the intermediary layer.
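
The file conversion idea can be sketched as a parser for one format driving a generator
for another through a shared event interface. Everything named here is hypothetical: the
MarkupEvents interface, the made-up brace format "name{...}" and the XmlTextGenerator
class are assumptions for illustration only, and the generator performs no escaping of
special characters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Generic event interface shared by parsers and generators (hypothetical).
class MarkupEvents {
public:
    virtual ~MarkupEvents() = default;
    virtual void OnStartElement(const std::string& name) = 0;
    virtual void OnContent(const std::string& text) = 0;
    virtual void OnEndElement(const std::string& name) = 0;
};

// Parser for a made-up brace format, e.g. "note{to{Bob}}".
// Text before '{' is an element name; text before '}' is element content.
void ParseBraceFormat(const std::string& doc, MarkupEvents& sink) {
    std::vector<std::string> open;  // names of elements not yet closed
    std::string buffer;
    for (char c : doc) {
        if (c == '{') {
            sink.OnStartElement(buffer);
            open.push_back(buffer);
            buffer.clear();
        } else if (c == '}') {
            if (!buffer.empty()) {
                sink.OnContent(buffer);
                buffer.clear();
            }
            if (open.empty()) return;  // unbalanced input: stop quietly
            sink.OnEndElement(open.back());
            open.pop_back();
        } else {
            buffer += c;
        }
    }
}

// Generator that writes the same events out as XML-style text.
class XmlTextGenerator : public MarkupEvents {
public:
    void OnStartElement(const std::string& name) override { out_ += "<" + name + ">"; }
    void OnContent(const std::string& text) override { out_ += text; }
    void OnEndElement(const std::string& name) override { out_ += "</" + name + ">"; }
    const std::string& Output() const { return out_; }
private:
    std::string out_;
};

// Conversion is simply "parse format A into events, generate format B from them".
std::string ConvertBraceToXml(const std::string& doc) {
    XmlTextGenerator generator;
    ParseBraceFormat(doc, generator);
    return generator.Output();
}
```

Because the parser and the generator meet only at the event interface, either side could
be replaced by a different plug-in without touching the other.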

Another advantage of the present invention is that it not only allows different parsers
and generators to be readily used by the same client, but also allows several different
clients to share the same parsers and generators. The API may itself be
extensible, so that extensions to its capabilities (e.g. to enable a new/extended mark-up
language of a document to be handled) can be made without affecting compatibility with
existing clients or existing parsers and generators. Similarly, new kinds of clients can be
provided for loading onto a device after that device has been shipped to an end-user.
The only requirement is that they are compatible with the intermediary layer.

2. Data validation/pre-filtering and altering components in a chain of
responsibility

The essence of this approach is that the mark-up language parser or generator can access
components to validate, pre-filter or alter data, in which the components are plug-in
components that operate using a chain of responsibility. They may be plug-ins to the
extensible framework described above.

Because of the plug-in design of the components, the system is inherently flexible and
extensible compared with prior art systems in which a component (for validating,
pre-filtering or altering data from a parser or generator) would be tied exclusively to a given
parser. Hence, if a mark up language of a document is extended, or a new one created, it
is possible to write any new validation/pre-filter/altering plug-in that is needed to work
with the extended or new language. These new kinds of validation/pre-filter/altering
plug-ins can be provided for loading onto a device even after that device has been
shipped to an end-user. The 'chain of responsibility' design pattern, whilst known in
object oriented programming, has not previously been used in the present context.

The plug-in components may all present a common, generic API to the parser and
generator. Hence, the same plug-in can be used with different types of parsers and
generators (e.g. an XML parser, a WBXML parser, an RTF parser etc.). The plug-ins also
present a common, generic API to a client component using the parser or generator.
Hence, the same plug-ins can be used by different clients.

For example, a DTD validator plug-in could be written that validates the mark-up of a
document and can report errors to the client. Or, for a web browser, an auto correction
plug-in filter could be written that tries to correct errors found in the mark-up language,
such as a missing end element tag or an incorrectly placed element tag. The auto
correction plug-in will, if it can, fix the error transparently to the client. This enables a
web browser to still display a document rather than just displaying a message reporting that
there was an error in the document.

Because the plug-ins can be chained together, complex and different types of filtering and
validation can take place. In the example above, the parser could notify the validator
plug-in of elements it is parsing; these in turn would go to the auto correction plug-in to
be fixed if required, and finally the client would receive these events.

The mark-up framework allows parser plug-ins to expose the parsed element stack to all
validation/pre-filter/altering plug-ins. (The parsed element stack is a stack populated
with elements from a document extracted as that document is parsed; this stack is made
available to all validation/pre-filter/altering plug-ins to avoid the need to duplicate the
stack for each of these plug-ins.) This also enables the plug-ins to use the stack
information to aid in validation and filtering. For example, an auto corrector plug-in may
need to know the entire element list that is on the stack in order to figure out how to fix
a problem.
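
As an illustration of how an auto corrector might use the shared element stack, the
following sketch works out which end tags need to be supplied when an end tag arrives
that does not match the innermost open element. The function name EndTagsToEmit and
the representation of the stack as a vector of element names are assumptions made for
this example; the source does not specify the correction strategy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Element stack maintained by the parser and exposed read-only to plug-ins.
// The last entry is the innermost open element.
using ElementStack = std::vector<std::string>;

// Given the stack and an incoming end tag, return the end tags that must be
// emitted, innermost first, so that the document stays well nested.
// Returns an empty list if the end tag matches nothing on the stack.
std::vector<std::string> EndTagsToEmit(const ElementStack& stack,
                                       const std::string& endTag) {
    std::vector<std::string> toClose;
    for (auto it = stack.rbegin(); it != stack.rend(); ++it) {
        toClose.push_back(*it);
        if (*it == endTag) return toClose;  // found the element being closed
    }
    return {};  // stray end tag: nothing sensible to close
}
```

With the stack html, body, p and an incoming end tag for body, the function reports that
p must be closed before body, which is the missing end tag case mentioned above.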

The use of filter/validator plug-ins in mark-up language generators is especially useful for
developers writing a client to the framework and generating mark-up documents, as the
same validator plug-in used by the parser can be used in the generator. Errors are
reported to the client when the mark-up does not conform to the validator, which
enables the developer to make sure they are writing well formed mark-up that conforms
to the DTD and to catch errors early on during development.

The mark-up framework incorporates a character conversion module that enables
documents written in different character sets (e.g. ASCII, various Kanji character sets
etc.) to be parsed and converted to UTF8. This means a client obtains the results from
the parser in a generic way (UTF8) without having to know the original character set that
was used in the document. Clients hence no longer need to be able to differentiate
between different character sets and handle the different character sets appropriately.
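
The effect of the character conversion step can be shown for one simple source character
set. The sketch below converts ISO-8859-1 (Latin-1) bytes to UTF8; the choice of
Latin-1 and the function name Latin1ToUtf8 are assumptions for illustration, and a real
conversion module would dispatch on the character set declared by the document.

```cpp
#include <cassert>
#include <string>

// Convert ISO-8859-1 bytes to UTF-8. Each Latin-1 byte is the Unicode code
// point of the same value, so bytes below 0x80 pass through unchanged and
// all others become a two-byte UTF-8 sequence.
std::string Latin1ToUtf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // leading byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

A client that receives only the converted text never needs to know that the document
was originally encoded as Latin-1.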

3. Generic Data Supplier API

The mark-up language parser or generator accesses data from a source using the
extensible framework, i.e. a generic data supplier API. Hence, the parser or generator is
insulated from having to talk directly to a data source; instead, it does so via the generic
data supplier API, acting as an intermediary layer. This de-couples the parser or
generator from the data source and hence means that the parser or generator no longer
has to be hard coded for a specific data supplier. This in turn leads to a simplification
of the parser and generator design.

The present invention allows parsing and generation to be carried out with any data
source. For example, a buffer in memory could be used, as could a file, as could
streaming from a socket (hence enabling parsing in real-time from data
streamed over the internet). There is no requirement to define, at parser/generator build
time, what particular data source will be used. Instead, the system allows any source that
can use the generic data supplier API to be adopted. New types of data sources can be
utilised by computing devices, even after those devices have been shipped to end-users.
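
A minimal sketch of the data supplier idea follows. The interface name DataSupplier, its
Read signature and the two suppliers are assumptions for illustration and are not the
MDataSupplierReader interface of the framework; CountTags merely stands in for a
parser that pulls its input in small chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Generic data supplier interface (hypothetical name and signature).
class DataSupplier {
public:
    virtual ~DataSupplier() = default;
    // Copies up to maxBytes into out; returns the number of bytes supplied,
    // or 0 when no more data is available.
    virtual std::size_t Read(char* out, std::size_t maxBytes) = 0;
};

// Supplies data from a buffer held in memory.
class BufferSupplier : public DataSupplier {
public:
    explicit BufferSupplier(std::string data) : data_(std::move(data)) {}
    std::size_t Read(char* out, std::size_t maxBytes) override {
        std::size_t n = std::min(maxBytes, data_.size() - pos_);
        data_.copy(out, n, pos_);
        pos_ += n;
        return n;
    }
private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Supplies data from any std::istream (for example a file stream).
class StreamSupplier : public DataSupplier {
public:
    explicit StreamSupplier(std::istream& in) : in_(in) {}
    std::size_t Read(char* out, std::size_t maxBytes) override {
        in_.read(out, static_cast<std::streamsize>(maxBytes));
        return static_cast<std::size_t>(in_.gcount());
    }
private:
    std::istream& in_;
};

// Stand-in for a parser: counts '<' characters, pulling data four bytes at a
// time. It has no knowledge of where the bytes come from.
int CountTags(DataSupplier& source) {
    char chunk[4];
    int tags = 0;
    for (std::size_t n; (n = source.Read(chunk, sizeof chunk)) > 0;) {
        tags += static_cast<int>(std::count(chunk, chunk + n, '<'));
    }
    return tags;
}
```

A new kind of source, such as a socket, would be supported by adding one more class
derived from DataSupplier, with CountTags left unchanged.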


The following describes the Mark-Up Language Framework in more detail. Appendix C
describes a particular technique, referred to as 'String Pool', which is used in the
Mark-Up Language Framework. Various SymbianOS specific programming techniques and
structures are referred to. There is an extensive published literature describing these
techniques; reference may for example be made to "Professional Symbian Programming",
Wrox Press Inc., ISBN: 186100303X, the contents of which are incorporated by
reference.

Design Overview
Block Diagrams 

The mark-up language extensible framework is shown schematically in Figure 1. This is 
implemented as part of the operating system of a computing device. The Client 1 is the 
application using the mark-up framework for parsing or generating a document. The 
Parser 2 and Generator 3 components are plug-ins specific to a mark-up language (e.g. 
XML or WBXML); they are plug-ins to the extensible framework — i.e. a set of generic 
APIs that enable the client 1 to communicate with Parser 2 and Generator 3. The 
plug-ins conform to Symbian OS requirements known as 'ECOM'. 

Because of the framework architecture, many new kinds of parsers and generators (e.g. to
handle extensions to mark up languages, new languages or new schemas) can readily be
loaded onto the device, even after the device has shipped. Further, different clients
running on the device can share the same parser or generator; these clients are simpler
than prior art clients since they need to operate with only a single, generic API. The API is
shown symbolically as the API abstraction or intermediary layer 10. The Parser 2 and
Generator 3 components use the Namespace collection 4 to retrieve information
about a specific namespace during the parsing or generating phase.

The Namespace Plug-in 5 component is an ECOM plug-in that sets up all the
elements, attributes and attribute values for a namespace. For each namespace used,
there must be a plug-in that describes the namespace. The namespace information is
stored in a string pool. The string pool is a way of storing strings that makes comparison
almost instantaneous at the expense of string creation. It is particularly efficient at
handling string constants that are known at compile time, which makes it very suitable
for processing documents. Appendix C includes more detail on string pools. The
Namespace collection 4 owns the string pool that the Parser 2, Generator 3 and
Client 1 can gain access to.
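
The trade-off described for the string pool (slower creation, near-instant comparison)
can be seen in a minimal interning sketch. This is not the Symbian OS string pool: the
class StringPool, its integer Handle and the method names are assumptions for
illustration, and the sketch omits the distinction between static and dynamic strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal string pool: each distinct string is stored once and identified by
// an integer handle, so equality checks compare integers, not characters.
class StringPool {
public:
    using Handle = std::size_t;

    // Returns the existing handle for s, or stores s and returns a new handle.
    // This is the comparatively expensive step (hashing and possibly copying).
    Handle Intern(const std::string& s) {
        auto it = index_.find(s);
        if (it != index_.end()) return it->second;
        strings_.push_back(s);
        index_.emplace(s, strings_.size() - 1);
        return strings_.size() - 1;
    }

    // Recovers the text for a handle; throws std::out_of_range if invalid.
    const std::string& Lookup(Handle h) const { return strings_.at(h); }

    std::size_t Size() const { return strings_.size(); }

private:
    std::vector<std::string> strings_;
    std::unordered_map<std::string, Handle> index_;
};
```

Once element names have been interned, a check such as "is this the SyncHdr element"
reduces to comparing two handles.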

The Namespace Plug-in 5 simply sets up the string pool with the required strings for
the namespace the plug-in represents. The Client 1 may get access to the Namespace
Collection 4 via the Parser 2 or Generator 3 to pre-load namespaces prior to parsing or
generating documents, which may speed up the parsing or generating session.

The Plug-in components (5A - 5D) are optional and allow further processing of the data
before the client receives it, such as DTD validators or document auto correctors.
Validators check that the elements and attributes conform to the DTD. Document auto
correction plug-ins are used to try to correct errors reported from DTD validators. These
components are also plug-ins to the extensible framework and hence share at least some
of the same APIs 10 as the Parser 2 and Generator 3. The Parser 2 is event driven and
sends events to the various plug-ins and UI during parsing.

Figure 2 shows a Client 21 parsing with a DTD validator 22 and Auto corrector 23;
these components are also plug-ins to the extensible framework and use the same generic
interface, again indicated schematically as layer 10. As a consequence, these components
will operate with any parser or generator that is a plug-in to the extensible framework.
In operation, the Client 21 talks to the Parser 24 directly to start the parse. The Parser
24 sends events to the Plug-ins 22, 23; they operate using a 'chain of responsibility'.
The first plug-in that receives events is the DTD validator plug-in 22. This plug-in
validates that the data in the event it received is correct. If it is not correct, it sends the
Auto corrector 23 the same event that the Parser 24 sent to the Validator 22, with the
addition of an error code that describes the problem the Validator 22 encountered. If the
event data is valid, the same event is sent to the Auto corrector 23 unchanged. The Auto
corrector 23 then receives the event and can check for any errors. If there is an error, it can
attempt to correct it. If it can correct the error, it will modify the data in the event and
remove the error code before sending the event to the client. The Client 21 finally receives
the event and can now handle it.
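
The event flow of Figure 2 can be sketched as a small chain of handlers. The classes
below are illustrative assumptions only: the ElementEvent structure, the error code
kUnknownElement, and the particular validation rule (element name must be in an allowed
set) and correction rule (fix wrong letter case) are invented for the example and do not
come from the framework.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <utility>
#include <vector>

const int kUnknownElement = 1;  // made-up error code for this sketch

struct ElementEvent {
    std::string name;
    int error = 0;  // 0 means the event carries no error
};

// One link in the chain; forwards the event to the next link when done.
class EventHandler {
public:
    virtual ~EventHandler() = default;
    void SetNext(EventHandler* next) { next_ = next; }
    virtual void OnStartElement(ElementEvent event) { Forward(event); }
protected:
    void Forward(const ElementEvent& event) {
        if (next_) next_->OnStartElement(event);
    }
private:
    EventHandler* next_ = nullptr;
};

// Flags unknown element names but always passes the event on.
class Validator : public EventHandler {
public:
    explicit Validator(std::set<std::string> allowed) : allowed_(std::move(allowed)) {}
    void OnStartElement(ElementEvent event) override {
        if (allowed_.count(event.name) == 0) event.error = kUnknownElement;
        Forward(event);
    }
private:
    std::set<std::string> allowed_;
};

// Tries to repair flagged events; clears the error code when it succeeds.
class AutoCorrector : public EventHandler {
public:
    explicit AutoCorrector(std::set<std::string> allowed) : allowed_(std::move(allowed)) {}
    void OnStartElement(ElementEvent event) override {
        if (event.error == kUnknownElement) {
            std::string lower = event.name;
            std::transform(lower.begin(), lower.end(), lower.begin(),
                           [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
            if (allowed_.count(lower) != 0) {  // fixable: wrong letter case
                event.name = lower;
                event.error = 0;
            }
        }
        Forward(event);
    }
private:
    std::set<std::string> allowed_;
};

// End of the chain: the client simply records what it receives.
class RecordingClient : public EventHandler {
public:
    void OnStartElement(ElementEvent event) override { received.push_back(event); }
    std::vector<ElementEvent> received;
};
```

A parser would hold a pointer to the first handler only; links can be added, removed or
reordered with SetNext without changing the parser or the client.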

Figure 3 illustrates a Client 31 generating mark up language using a generator 34 with a
DTD validator 32 and Auto corrector plug-in 33. All plug-ins are plug-ins to the
extensible framework and hence share at least some of the same APIs, again symbolically
shown as API Layer 10. A real client would probably never use a generator and auto
corrector since the data the client generates should always be valid, but it is used here to
show the flow of events from a generator and any plug-ins attached.

The Client 31 sends a build request to the Generator 34. The first thing the Generator
34 does is to send the request as an event to the DTD validator plug-in 32. The
situation is similar to the parser: the DTD validator plug-in 32 validates that the data in
the event it received is correct. If it is not correct, it sends the Auto corrector 33 the same
event that the Generator 34 sent to the Validator 32, with the addition of an error
code that describes the problem the Validator 32 encountered. If the event data is
valid, the same event is sent to the Auto corrector 33 unchanged. The Auto corrector 33
then receives the event and can check for any errors. If there is an error, it can attempt to
correct it. If it can correct the error, it will modify the data in the event and remove the
error code before sending the event back to the Generator 34. The major difference
between the events during parsing and generating is that with generating, once the final
plug-in has dealt with the event it gets sent back to the generator. The generator receives
the event and builds up part of the document using the details from the event.

Parsing and Generating WBXML 

Parsing WBXML is quite different to parsing XML or HTML. The main difference is that
elements and attributes are defined as tokens rather than using their text representation.
This means a mapping needs to be stored between a WBXML token and its static string
representation. The Namespace plug-in for a particular namespace will store these
mappings. A WBXML parser and generator can then obtain a string from the
namespace plug-in given the WBXML token and vice versa. Appendix C deals with
this in more detail.
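
The two-way mapping between tokens and strings can be sketched as a pair of look-up
tables. The class TokenTable, the helper functions and in particular the token values
below are placeholders chosen for the example; they should not be read as the values
assigned by any WBXML or SyncML specification.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Two-way token table for one namespace, as a namespace plug-in might supply.
class TokenTable {
public:
    void Add(std::uint8_t token, const std::string& name) {
        nameByToken_[token] = name;
        tokenByName_[name] = token;
    }
    // Used when parsing: returns an empty string for an unknown token.
    std::string NameFor(std::uint8_t token) const {
        auto it = nameByToken_.find(token);
        return it == nameByToken_.end() ? std::string() : it->second;
    }
    // Used when generating: returns 0 for an unknown name
    // (0 is not used as an element token in this sketch).
    std::uint8_t TokenFor(const std::string& name) const {
        auto it = tokenByName_.find(name);
        return it == tokenByName_.end() ? static_cast<std::uint8_t>(0) : it->second;
    }
private:
    std::map<std::uint8_t, std::string> nameByToken_;
    std::map<std::string, std::uint8_t> tokenByName_;
};

// Placeholder table; the numeric values are illustrative only.
TokenTable MakeExampleTable() {
    TokenTable table;
    table.Add(0x2B, "SyncBody");
    table.Add(0x2C, "SyncHdr");
    table.Add(0x2D, "SyncML");
    return table;
}

// Turns a sequence of element tokens into element names, skipping unknown tokens.
std::vector<std::string> Decode(const TokenTable& table,
                                const std::vector<std::uint8_t>& tokens) {
    std::vector<std::string> names;
    for (std::uint8_t token : tokens) {
        std::string name = table.NameFor(token);
        if (!name.empty()) names.push_back(name);
    }
    return names;
}
```

The parser uses NameFor and the generator uses TokenFor, so both directions are served
by the same table supplied for the namespace.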

Class Diagram 

The class diagram for the mark-up framework is shown in Figure 4. The diagram also
depicts plug-ins that make use of the framework. The dark grey classes are the plug-ins
that provide implementation to the mark-up framework. CXMLParser 42 and
CWbxmlParser 43 provide an implementation to parse XML and WBXML documents
respectively. In the same way, CXmlGenerator 44 and CWbxmlGenerator 45 generate
XML and WBXML documents respectively. CValidator 47 is a plug-in which will
validate the mark-up document during parsing or generating. CAutoCorrector 46 is a
plug-in that corrects invalid mark-up documents.



When parsing a document, the Client 41 receives events, for example for the start of an
element (OnStartElementL). The element RString in the event is a handle to a
string in the string pool. If this is a known string, i.e. one that has been added by the
Namespace Plug-in, then the string will be static. Otherwise, if it is an unknown string,
the parser will add the string to the string pool as a dynamic string and return an RString
with a handle to this string. It is not possible to know if an RString is dynamic or static, so
the parser or generator that obtains an RString must be sure to close it to ensure any
memory is released if the string is dynamic. A client that wishes to use the RString after
the event returns to the parser must make a copy of it, which will increase the reference
count and make sure it is not deleted when the parser closes it.

The key to the Figure 4 shading is as follows:



NAME                        TYPE OF CLASS

RNamespaceCollection        Mark-up framework class
RParserSession              Mark-up framework class
RGeneratorSession           Mark-up framework class
CMarkupPluginBase           Mark-up framework class
CMarkupNamespace            Mark-up framework class
RTableCodePage              Mark-up framework class
CMarkupCharSetConverter     Mark-up framework class
CMarkupPlugin               Mark-up framework class
CParserSession              Mark-up framework class
CGeneratorSession           Mark-up framework class
RAttribute                  Mark-up framework class
RElementStack               Mark-up framework class
MMarkupCallback             Mix-in class used for call-backs
MDataSupplierReader         Mix-in class used for call-backs
MDataSupplierWriter         Mix-in class used for call-backs
CActive                     System class in the Symbian OS
RAttributeArray             System class in the Symbian OS
CNamespace                  Implementation plug-in classes
CValidator                  Implementation plug-in classes
CAutoCorrector              Implementation plug-in classes
CXMLParser                  Implementation plug-in classes
CWbxmlParser                Implementation plug-in classes
CXmlGenerator               Implementation plug-in classes
CWbxmlGenerator             Implementation plug-in classes
CDescriptorDataSupplier     Implementation plug-in classes

Figure 5 is an example class diagram that shows the major classes for parsing WBXML
SyncML documents. The client creates a CDescriptorDataSupplier 51 that supplies the
data to the parser. CWbxmlParser 52 is the class that actually parses the document.
CSyncMLNamespace 53 is the namespace for SyncML that the parser uses to map
WBXML tokens to strings. All the other classes belong to the mark-up framework. To
parse a document with different namespaces the only thing that needs to be added is a
plug-in for each namespace.
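
The token-to-string mapping performed by a namespace plug-in can be sketched as follows. This is a minimal illustration in portable C++, not the Symbian API: the class name NamespaceSketch, the helper MakeExampleNamespace, the URI and the token values are all assumptions made for the example and are not taken from the SyncML specification.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a namespace plug-in: one WBXML code page
// with a token -> element-name table. Not the Symbian CMarkupNamespace API.
class NamespaceSketch {
public:
    NamespaceSketch(std::uint8_t codePage, std::string uri,
                    std::map<std::uint8_t, std::string> elements)
        : codePage_(codePage), uri_(std::move(uri)), elements_(std::move(elements)) {}

    std::uint8_t CodePage() const { return codePage_; }
    const std::string& NamespaceUri() const { return uri_; }

    // Returns the element name for a token, or nullopt if the token is unknown.
    std::optional<std::string> Element(std::uint8_t token) const {
        auto it = elements_.find(token);
        if (it == elements_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::uint8_t codePage_;
    std::string uri_;
    std::map<std::uint8_t, std::string> elements_;
};

// Illustrative table only: the URI and token values are made up.
inline NamespaceSketch MakeExampleNamespace() {
    return NamespaceSketch(0, "urn:example:syncml",
                           {{0x05, "Add"}, {0x06, "Alert"}});
}
```

A parser written against such an interface only needs a different table, supplied by a different plug-in, to handle another namespace.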


The key to the Figure 5 shading is as follows: 



NAME | TYPE OF CLASS
RParserSession | Mark-up framework class
CMarkupPluginBase | Mark-up framework class
RNamespaceCollection | Mark-up framework class
CParserSession | Mark-up framework class
CMarkupNamespace | Mark-up framework class
MMarkupCallback | Mix-in class used for call-backs
MDataSupplierReader | Mix-in class used for call-backs
CActive | System class in the Symbian OS
CDescriptorDataSupplier | Implementation plug-in classes
CWbxmlParser | Implementation plug-in classes
CSyncMLNamespace | Implementation plug-in classes



Class Dictionary









MMarkupCallback | A call-back that a client must implement so that the parser can report events back to the client during the parsing session. | Inherited by clients and plug-ins.
RNamespaceCollection | Contains a collection of namespaces. Contains [illegible] parsers or generators may use [illegible] collection. | [Owned by] CParserSession or CGeneratorSession. Owns an array of CMarkupNamespace plug-ins.
CMarkupNamespace | ECOM interface to implement a namespace. | Inherited by any namespace plug-ins.
RParserSession | Public interface for a client to create a parser session. | Owned by the client.
RGeneratorSession | Public interface for a client to create a generator session. | Owned by the client.
CMarkupCharSetConverter | Helper function which uses CCnvCharacterSetConverter [illegible] generator to do any character set conversions or resolving MIB Enums or Internet-standard names of character sets. | Owned by RParserSession and RGeneratorSession.
CMarkupPluginBase | Generic interface for any type of plug-in. | Inherited by CMarkupPlugin, CParserSession and CGeneratorSession.
CMarkupPlugin | ECOM interface for plug-ins to be used by the parser and generator. | Owned by CParserSession or CGeneratorSession.
MDataSupplierReader | Pure virtual interface to be implemented by a data supplier for reading data. | Inherited by the client's data provider.
MDataSupplierWriter | Pure virtual interface to be implemented by a data supplier for writing data. | Inherited by the client's data provider.
CParserSession | ECOM interface for parser plug-ins. | Inherited by a concrete parser implementation.
CGeneratorSession | ECOM interface for generator plug-ins. | Inherited by a concrete generator implementation.
RAttribute | Contains the name and value of an attribute. | Used by the parser, generator and client.


The classes below are not part of the framework but illustrate how the framework
can be used.


CValidator | A DTD, schema or some other type of validator. | Owned by RParserSession or RGeneratorSession.
CAutoCorrector | Used to auto correct invalid data. | Owned by RParserSession or RGeneratorSession.
CXmlParser | An XML parser implementation. | Owned by RParserSession.
CWbxmlParser | A WBXML parser implementation. | Owned by RParserSession.
CXmlGenerator | An XML generator implementation. | Owned by RGeneratorSession.
CWbxmlGenerator | A WBXML generator implementation. | Owned by RGeneratorSession.
CNamespace | A namespace plug-in to use with a parser and generator. | Owned by RNamespaceCollection.
RElementStack | A stack of the currently processed elements during parsing or generating. | Owned by CParserSession and CGeneratorSession.

WO 2005/036390    PCT/GB2004/004276



Detailed Design 
RParserSession

The following is the public API for this class:

Method: void OpenL(MDataSupplierReader& aReader, const TDesC8& aMarkupMimeType, const TDesC8& aDocumentMimeType, MMarkupCallback& aCallback)
Description: Opens a parser session.
    aReader is the data supplier reader to use during parsing.
    aMarkupMimeType is the MIME type of the parser to open.
    aDocumentMimeType is the MIME type of the document to parse.
    aCallback is a reference to the call-back so the parser can report events.

Method: void OpenL(MDataSupplierReader& aReader, const TDesC8& aMarkupMimeType, const TDesC8& aDocumentMimeType, MMarkupCallback& aCallback, RMarkupPlugins aPlugins)
Description: Opens a parser session.
    aReader is the data supplier reader to use during parsing.
    aMarkupMimeType is the MIME type of the parser to open.
    aDocumentMimeType is the MIME type of the document to parse.
    aCallback is a reference to the call-back so the parser can report events.
    aPlugins is an array of plug-ins to use with the parser. The first plug-in in the list is the
    first plug-in to be called back from the parser. The first plug-in will then call back to the
    second plug-in, etc.


Method: void OpenL(MDataSupplierReader& aReader, const TDesC8& aMarkupMimeType, const TDesC8& aDocumentMimeType, MMarkupCallback& aCallback, RMarkupPlugins aPlugins[], RNamespaceCollection aNamespaceCollection)
Description: Opens a parser session.
    aReader is the data supplier reader to use during parsing.
    aMarkupMimeType is the MIME type of the parser to open.
    aDocumentMimeType is the MIME type of the document to parse.
    aCallback is a reference to the call-back so the parser can report events.
    aPlugins is an array of plug-ins to use with the parser. The first plug-in in the list is the
    first plug-in to be called back from the parser. The first plug-in will then call back to the
    second plug-in, etc.
    aNamespaceCollection is a handle to a previous namespace collection. This is useful if a
    generator or another parser session has been created so that the same namespace
    collection can be shared.

Method: void Close()
Description: Closes the parser session.

Method: void Start()
Description: Start parsing the document.

Method: void Stop()
Description: Stop parsing the document.

Method: void Reset(MDataSupplierReader& aReader, MMarkupCallback& aCallback)
Description: Resets the parser ready to parse a new document.
    aReader is the data supplier reader to use during parsing.
    aCallback is a reference to the call-back so the parser can report events.


Method: TInt SetParseMode(TInt aParseMode)
Description: Selects one or more parse modes.
    aParseMode is one or more of the following:
    EConvertTagsToLowerCase - Converts elements and attributes to lowercase. This can
        be used for case-insensitive HTML so that a tag can be matched to a static string in
        the string pool.
    EErrorOnUnrecognisedTags - Reports an error when unrecognised tags are found.
    EReportUnrecognisedTags - Reports unrecognised tags.
    EReportNamespaces - Reports the namespace.
    EReportNamespacePrefixes - Reports the namespace prefix.
    ESendFullContentInOneChunk - Sends all content data for an element in one chunk.
    EReportNameSpaceMapping - Reports namespace mappings via the
        DoStartPrefixMapping() & DoEndPrefixMapping() methods.
    If this function is not called the default will be:
        EReportUnrecognisedTags | EReportNamespaces
    If the parsing mode is not supported KErrNotSupported is returned.
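
The way parse modes combine as bit flags, with a documented default and a rejection path for unsupported modes, can be sketched as follows. The flag names come from the table above; the numeric bit values, the class ParserModeSketch and the idea of a per-parser "supported" mask are assumptions made for illustration. The two error constants mirror the Symbian values KErrNone and KErrNotSupported.

```cpp
#include <cassert>

// Flag names follow the table above; the bit positions are assumptions.
enum ParseMode : unsigned {
    EConvertTagsToLowerCase    = 1u << 0,
    EErrorOnUnrecognisedTags   = 1u << 1,
    EReportUnrecognisedTags    = 1u << 2,
    EReportNamespaces          = 1u << 3,
    EReportNamespacePrefixes   = 1u << 4,
    ESendFullContentInOneChunk = 1u << 5,
    EReportNameSpaceMapping    = 1u << 6
};

constexpr int kErrNone = 0;           // mirrors Symbian KErrNone
constexpr int kErrNotSupported = -5;  // mirrors Symbian KErrNotSupported

// Hypothetical parser-side holder of the current parse mode.
class ParserModeSketch {
public:
    explicit ParserModeSketch(unsigned supported) : supported_(supported) {}

    // Accepts the request only if every requested bit is supported;
    // on rejection the previous mode is kept.
    int SetParseMode(unsigned mode) {
        if ((mode & ~supported_) != 0) return kErrNotSupported;
        mode_ = mode;
        return kErrNone;
    }

    bool IsSet(ParseMode flag) const { return (mode_ & flag) != 0; }

private:
    unsigned supported_;
    // Default used when SetParseMode is never called.
    unsigned mode_ = EReportUnrecognisedTags | EReportNamespaces;
};
```

A client would typically OR together the modes it wants and check the return value before calling Start().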






RGeneratorSession

The following is the public API for this class:

Method: void OpenL(MDataSupplierWriter& aWriter, TUid aMarkupMimeType, const TDesC8& aDocumentMimeType)
Description: Opens a generator session.
    aWriter is the data supplier writer used to generate a document.
    aMarkupMimeType is the MIME type of the generator to open.
    aDocumentMimeType is the MIME type of the document to parse.

Method: void OpenL(MDataSupplierWriter& aWriter, TUid aMarkupMimeType, const TDesC8& aDocumentMimeType, RMarkupPlugins aPlugins[])
Description: Opens a generator session.
    aWriter is the data supplier writer used to generate a document.
    aMarkupMimeType is the MIME type of the generator to open.
    aDocumentMimeType is the MIME type of the document to parse.
    aPlugins is an array of plug-ins to use with the generator.

Method: void OpenL(MDataSupplierWriter& aWriter, TUid aMarkupMimeType, const TDesC8& aDocumentMimeType, RMarkupPlugins aPlugins[], RNamespaceCollection aNamespaceCollection)
Description: Opens a generator session.
    aWriter is the data supplier writer used to generate a document.
    aMarkupMimeType is the MIME type of the generator to open.
    aDocumentMimeType is the MIME type of the document to parse.
    aPlugins is an array of plug-ins to use with the generator.
    aNamespaceCollection is a handle to a previous namespace collection. This is useful if a
    generator or another parser session has been created so that the same namespace
    collection can be shared.


Method: void Close()
Description: Closes the generator session.

Method: void Reset(MDataSupplierWriter& aWriter, MMarkupCallback& aCallback)
Description: Resets the generator ready to generate a new document.
    aWriter is the data supplier writer used to generate a document.
    aCallback is a reference to the call-back so the generator can report events.

Method: void BuildStartDocumentL(RDocumentParameters aDocParam);
Description: Builds the start of the document.
    aDocParam specifies the various parameters of the document. In the case of WBXML this
    would state the public ID and string table.

Method: void BuildEndDocumentL()
Description: Builds the end of the document.

Method: void BuildStartElementL(RTagInfo& aElement, RAttributeArray& aAttributes)
Description: Builds the start element with attributes and namespace if specified.
    aElement is a handle to the element's details.
    aAttributes contains the attributes for the element.

Method: void BuildEndElementL(RTagInfo& aElement)
Description: Builds the end of the element.
    aElement is a handle to the element's details.

Method: void BuildContentL(const TDesC8& aContentPart)
Description: Builds part or all of the content. Large content should be built in chunks, i.e. this
    function should be called many times, once for each chunk.
    aContentPart is the raw content data. This data must be converted to the correct
    character set by the client.

Method: void BuildPrefixMappingL(RString& aPrefix, RString& aUri)
Description: Builds a prefix - URI namespace for the next element to be built. This method can be
    called for each namespace that needs to be declared.
    aPrefix is the Namespace prefix being declared.
    aUri is the Namespace URI the prefix is mapped to.

Method: void BuildProcessingInstructionL(RString& aTarget, RString& aData)
Description: Build a processing instruction.
    aTarget is the processing instruction target.
    aData is the processing instruction data.


RTagInfo

The following is the public API for this class:

Method: void Open(RString& aUri, RString& aPrefix, RString& aLocalName)
Description: Sets the tag information for an element or attribute.
    aUri is the URI of the namespace.
    aPrefix is the prefix of the qualified name.
    aLocalName is the local name of the qualified name.

Method: void Close()
Description: Closes the tag information.

Method: RString& Uri()
Description: Returns the URI.

Method: RString& LocalName()
Description: Returns the local name.

Method: RString& Prefix()
Description: Returns the prefix.


RNamespaceCollection

The following is the public API for this class:

Method: void Connect()
Description: Every time this method is called a reference counter is incremented so that the
    namespace collection is only destroyed when no clients are using it.

Method: void Close()
Description: Every time this method is called a reference counter is decremented and the object is
    destroyed only when the reference counter is zero.

Method: const CMarkupNamespace& OpenNamespaceL(const TDesC8& aMimeType)
Description: Opens a namespace plug-in and returns a reference to the namespace plug-in. If the
    namespace plug-in is not loaded it will be automatically loaded.
    aMimeType is the MIME type of the plug-in to open.

Method: const CMarkupNamespace& OpenNamespaceL(TUint8 aCodePage)
Description: Opens a namespace plug-in and returns a reference to the namespace plug-in.
    aCodePage is the code page of the plug-in to open.

Method: void Reset()
Description: Resets the namespace collection and string pool.

Method: RStringPool StringPool()
Description: Returns a handle to the string pool object.
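
The Connect()/Close() reference counting described above can be sketched as follows. This is a minimal portable C++ illustration, not the Symbian implementation: CollectionBody and NamespaceCollectionHandle are hypothetical names, and a boolean flag stands in for actually destroying the shared object.

```cpp
#include <cassert>

// Hypothetical shared state behind a namespace-collection handle.
struct CollectionBody {
    int refCount = 0;
    bool destroyed = false;  // stands in for freeing the real object
};

// Hypothetical handle: several handles may refer to the same body.
class NamespaceCollectionHandle {
public:
    explicit NamespaceCollectionHandle(CollectionBody& body) : body_(&body) {}

    // Each user of the collection registers itself.
    void Connect() { ++body_->refCount; }

    // The body is destroyed only when the last user closes its handle.
    void Close() {
        if (body_->refCount > 0 && --body_->refCount == 0) {
            body_->destroyed = true;
        }
    }

private:
    CollectionBody* body_;
};
```

With this arrangement a parser session and a generator session can share one collection, and whichever closes last releases it.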


CMarkupNamespace

The following is the API for this class:

Method: void NewL(RStringPool aStringPool)
Description: Creates the namespace plug-in.
    aStringPool is a handle of the string pool to add static string tables.

Method: RString& Element(TUint8 aWbxmlToken) const
Description: Returns a handle to the string.
    aWbxmlToken is the WBXML token of the element.

Method: void AttributeValuePair(TUint8 aWbxmlToken, RString& aAttribute, RString& aValue) const
Description: Returns a handle to the attribute and value strings.
    aWbxmlToken is the WBXML token of the attribute.
    aAttribute is the handle to the attribute string.
    aValue is the handle to the value string.

Method: RString& AttributeValue(TUint8 aWbxmlToken) const
Description: Returns a handle to an attribute value.
    aWbxmlToken is the WBXML token of the attribute.

Method: RString& NamespaceUri() const
Description: Returns the namespace name.

Method: TUint8 CodePage() const
Description: Returns the code page for this namespace.


RTableCodePage

The following is the API for this class:

Method: RString NameSpaceUri()
Description: Returns the namespace URI for this code page.

Method: TInt StringPoolIndexFromToken(TInt aToken);
Description: Gets a StringPool index from a token value. -1 is returned if the item is not found.

Method: TInt TokenFromStringPoolIndex(TInt aIndex);
Description: Gets a token value from a StringPool index. -1 is returned if the item is not found.

CMarkupPluginBase

The following is the API for this ECOM class:

Method: CMarkupPluginBase& RootPlugin()
Description: Returns a reference to the root plug-in. This must be either a parser or generator
    plug-in.

Method: CMarkupPluginBase& ParentPlugin()
Description: Returns a reference to the parent plug-in.

Method: RElementStack& ElementStack()
Description: Returns a handle to the element stack.

Method: RNamespaceCollection& NamespaceCollection()
Description: Returns a handle to the namespace collection.

Method: CMarkupCharSetConverter& CharSetConverter()
Description: Returns a reference to the character set converter object.

Method: TBool IsChildElementValid(RString& aParentElement, RString& aChildElement)
Description: Checks if aChildElement is a valid child of aParentElement.


CMarkupPlugin

The following is the API for this ECOM class:

Method: CMarkupPlugin* NewL(MMarkupCallback& aCallback)
Description: Creates an instance of a mark-up plug-in.
    aCallback is a reference to the call-back to report events.

Method: void SetParent(CMarkupPluginBase* aParentPlugin)
Description: Sets the parent plug-in for this plug-in.
    aParentPlugin is a pointer to the parent plug-in or NULL if there is no parent. A parser
    or generator does not have a parent so this must not be set, as the default NULL will
    indicate there is no parent.


CParserSession

The following is the API for this ECOM class:

Method: CParserSession* NewL(MDataSupplierReader& aReader, const TDesC8& aMarkupMimeType, const TDesC8& aDocumentMimeType, MMarkupCallback& aCallback, RNamespaceCollection* aNamespaceCollection, CMarkupCharSetConverter& aCharSetConverter)
Description: Opens a parser session.
    aReader is the data supplier reader to use during parsing.
    aMarkupMimeType is the MIME type of the parser to open.
    aDocumentMimeType is the MIME type of the document to parse.
    aCallback is a reference to the call-back so the parser can report events.
    aNamespaceCollection is a handle to a previous namespace collection. Set to NULL if a
    new RNamespaceCollection is to be used.
    aCharSetConverter is a reference to the character set conversion class.

Method: void Start()
Description: Start parsing the document.

Method: void Stop()
Description: Stop parsing the document.

Method: void Reset(MDataSupplierReader& aReader, MMarkupCallback& aCallback)
Description: Resets the parser ready to parse a new document.
    aReader is the data supplier reader to use during parsing.
    aCallback is a reference to the call-back so the parser can report events.

Method: void SetParseMode(TInt aParseMode)
Description: Selects one or more parse modes.
    See RParserSession for details on aParseMode.


CGeneratorSession

The following is the API for this ECOM class:

Method: void OpenL(MDataSupplierWriter& aWriter, TUid aMarkupMimeType, const TDesC8& aDocumentMimeType, MMarkupCallback& aCallback, RNamespaceCollection* aNamespaceCollection, CMarkupCharSetConverter& aCharSetConverter)
Description: Opens a generator session.
    aWriter is the data supplier writer used to generate a document.
    aMarkupMimeType is the MIME type of the generator to open.
    aDocumentMimeType is the MIME type of the document to parse.
    aCallback is a reference to the call-back so the generator can report events.
    aNamespaceCollection is a handle to a previous namespace collection. Set to NULL if a
    new RNamespaceCollection is to be used.
    aCharSetConverter is a reference to the character set conversion class.

Method: void Reset(MDataSupplierWriter& aWriter, MMarkupCallback& aCallback)
Description: Resets the generator ready to generate a new document.
    aWriter is the data supplier writer used to generate a document.
    aCallback is a reference to the call-back so the generator can report events.

Method: void BuildStartDocumentL(RDocumentParameters aDocParam);
Description: Builds the start of the document.
    aDocParam specifies the various parameters of the document.

Method: void BuildEndDocumentL()
Description: Builds the end of the document.

Method: void BuildStartElementL(RTagInfo& aElement, RAttributeArray& aAttributes)
Description: Builds the start element with attributes and namespace if specified.
    aElement is a handle to the element's details.
    aAttributes contains the attributes for the element.

Method: void BuildEndElementL(RTagInfo& aElement)
Description: Builds the end of the element.
    aElement is a handle to the element's details.

Method: void BuildContentL(const TDesC8& aContentPart)
Description: Builds part or all of the content. Large content should be built in chunks, i.e. this
    function should be called many times, once for each chunk.
    aContentPart is the raw content data. This data must be converted to the correct
    character set by the client.

Method: void BuildProcessingInstructionL(RString& aTarget, RString& aData)
Description: Build a processing instruction.
    aTarget is the processing instruction target.
    aData is the processing instruction data.

RAttribute



The following is the API for this class:

Method: RTagInfo& Attribute()
Description: Returns a handle to the attribute's name details.

Method: TAttributeType Type()
Description: Returns the attribute's type, where TAttributeType is one of the following
    enumeration: CDATA, ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY,
    ENTITIES, NOTATION.

Method: RString& Value()
Description: Returns a handle to the attribute value. If the attribute value is a list of tokens
    (IDREFS, ENTITIES or NMTOKENS), the tokens will be concatenated into a single
    RString with each token separated by a single space.



MDataSupplierReader

The following is the API for this mix-in class:

Method: TUint8 GetByteL()
Description: Get a single byte from the data supplier.

Method: const TDesC8& GetBytesL(TInt aNumberOfBytes)
Description: Gets a descriptor of size aNumberOfBytes. If the number of bytes is not available this
    method leaves with KErrEof. The returned descriptor must not be deleted until another
    call to GetBytesL or EndTransactionL() is made.

Method: void StartTransactionL()
Description: The parser calls this to indicate the start of a transaction.

Method: void EndTransactionL()
Description: The parser calls this to indicate the transaction has ended. Any data stored for the
    transaction may now be deleted.

Method: void RollbackL()
Description: The parser calls this to indicate the transaction must be rolled back to the exact state
    as when StartTransactionL() was called.
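
The transaction behaviour of a data supplier reader can be sketched as follows. This is an illustrative buffer-backed reader in portable C++, not the Symbian mix-in: the class name BufferReaderSketch is hypothetical, strings stand in for descriptors, and a C++ exception stands in for a Symbian leave with KErrEof.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Buffer-backed reader sketch. Symbian "leaves" are modelled with
// exceptions, which is an assumption made for portability.
class BufferReaderSketch {
public:
    explicit BufferReaderSketch(std::string data) : data_(std::move(data)) {}

    unsigned char GetByte() {
        if (pos_ >= data_.size()) throw std::out_of_range("end of data");
        return static_cast<unsigned char>(data_[pos_++]);
    }

    // Fails without consuming anything if fewer than n bytes remain.
    std::string GetBytes(std::size_t n) {
        if (n > data_.size() - pos_) throw std::out_of_range("end of data");
        std::string out = data_.substr(pos_, n);
        pos_ += n;
        return out;
    }

    // Remember the read position so it can be restored later.
    void StartTransaction() { mark_ = pos_; inTransaction_ = true; }

    // The remembered position is no longer needed.
    void EndTransaction() { inTransaction_ = false; }

    // Restore the position recorded by StartTransaction().
    void Rollback() {
        if (inTransaction_) pos_ = mark_;
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
    std::size_t mark_ = 0;
    bool inTransaction_ = false;
};
```

A parser can use this pattern to attempt reading a multi-byte token and rewind cleanly when the supplier runs out of data part-way through.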


MDataSupplierWriter

The following is the API for this mix-in class:

Method: void PutByteL(TUint8 aByte)
Description: Put a byte in the data supplier.

Method: void PutBytesL(const TDesC8& aBytes)
Description: Puts a descriptor in the data supplier.


MMarkupCallback

The following is the API for this mix-in class:

Method: void OnStartDocumentL(RDocumentParameters aDocParam, TInt aErrorCode);
Description: Callback to indicate the start of the document.
    aDocParam specifies the various parameters of the document.
    aErrorCode is the error code. If this is not KErrNone then special action may be required.

Method: void OnEndDocumentL(TInt aErrorCode);
Description: Indicates the end of the document has been reached.
    aErrorCode is the error code. If this is not KErrNone then special action may be required.

Method: void OnStartElementL(RTagInfo& aElement, RAttributeArray& aAttributes, TInt aErrorCode);
Description: Callback to indicate an element has been parsed.
    aElement is a handle to the element's details.
    aAttributes contains the attributes for the element.
    aErrorCode is the error code. If this is not KErrNone then special action may be required.

Method: void OnEndElementL(RTagInfo& aElement, TInt aErrorCode);
Description: Callback to indicate the end of the element has been reached.
    aElement is a handle to the element's details.
    aErrorCode is the error code. If this is not KErrNone then special action may be required.

Method: void OnContentL(const TDesC8& aBytes, TInt aErrorCode)
Description: Sends the content of the element. Not all the content may be returned in one go. The
    data may be sent in chunks. When an OnEndElementL is received this means there is
    no more content to be sent.
    aBytes is the raw content data for the element. The client is responsible for converting
    the data to the required character set if necessary. In some instances with WBXML
    opaque data the content may be binary and must not be converted.
    aErrorCode is the error code. If this is not KErrNone then special action may be required.


Method: void OnStartPrefixMappingL(RString& aPrefix, RString& aUri, TInt aErrorCode)
Description: Notification of the beginning of the scope of a prefix-URI Namespace mapping. This
    method is always called before the corresponding OnStartElementL method.
    aPrefix is the Namespace prefix being declared.
    aUri is the Namespace URI the prefix is mapped to.
    aErrorCode is the error code. If this is not KErrNone then special action may be required.

Method: void OnEndPrefixMappingL(RString& aPrefix, TInt aErrorCode)
Description: Notification of the end of the scope of a prefix-URI mapping. This method is called
    after the corresponding DoEndElementL method.
    aPrefix is the Namespace prefix that was mapped.
    aErrorCode is the error code. If this is not KErrNone then special action may be required.

Method: void OnIgnorableWhiteSpaceL(const TDesC8& aBytes, TInt aErrorCode)
Description: Notification of ignorable whitespace in element content.
    aBytes are the ignored bytes from the document being parsed.
    aErrorCode is the error code. If this is not KErrNone then special action may be required.

Method: void OnSkippedEntityL(RString& aName, TInt aErrorCode)
Description: Notification of a skipped entity. If the parser encounters an external entity it does not
    need to expand it - it can return the entity as aName for the client to deal with.
    aName is the name of the skipped entity.
    aErrorCode is the error code. If this is not KErrNone then special action may be required.

Method: void OnProcessingInstructionL(const TDesC8& aTarget, const TDesC8& aData, TInt aErrorCode)
Description: Receive notification of a processing instruction.
    aTarget is the processing instruction target.
    aData is the processing instruction data. If empty none was supplied.
    aErrorCode is the error code. If this is not KErrNone then special action may be required.

Method: void OnOutOfDataL()
Description: There is no more data in the data supplier to parse. If there is more data to parse
    Start() should be called once there is more data in the supplier to continue parsing.

Method: void OnError(TInt aError)
Description: An error has occurred where aError is the error code.
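
Because content may arrive in several chunks and is only complete when the end-element call-back is received, a client usually buffers it. The following is a minimal sketch of that pattern in portable C++. CallbackSketch is a reduced, hypothetical interface with only three events and no error codes, and ContentCollector is an illustrative client, neither being the Symbian MMarkupCallback itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Reduced, hypothetical callback interface: only the events needed here.
struct CallbackSketch {
    virtual ~CallbackSketch() = default;
    virtual void OnStartElement(const std::string& name) = 0;
    virtual void OnContent(const std::string& bytes) = 0;
    virtual void OnEndElement(const std::string& name) = 0;
};

// Collects chunked content per element and records it when the element ends.
class ContentCollector : public CallbackSketch {
public:
    void OnStartElement(const std::string&) override { pending_.emplace_back(); }

    // Chunks are appended to the innermost open element.
    void OnContent(const std::string& bytes) override {
        if (!pending_.empty()) pending_.back() += bytes;
    }

    // The end of the element means no more content will follow for it.
    void OnEndElement(const std::string& name) override {
        if (pending_.empty()) return;
        finished.push_back(name + "=" + pending_.back());
        pending_.pop_back();
    }

    std::vector<std::string> finished;  // "element=content" in completion order

private:
    std::vector<std::string> pending_;  // one buffer per open element
};
```

Any character set conversion would be applied to the completed buffer, except where the content is known to be opaque binary data.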



Sequence Diagrams

Setting up, parsing and generating

Figure 6 shows the interaction of the client with the various parser objects to create a
parser and generator session. The parsing of a simple document with only one element
and generation of one element is shown. It is assumed a DTD validator and auto correct
component are used. Auto correction in this example is only used with the parser. The
generator only checks that tags are DTD compliant but does not try to correct any DTD
errors.

Element not valid at current level in DTD

Auto correction is left up to the plug-in implementers to decide how and what should be
corrected. The sequence diagram in Figure 7 shows an example of what is possible with
the case where the format of the document is valid, however, there is an invalid element
(C) that should be at a different level as shown in an example document below:

<A>Content
  <B>
    <C> // Not valid for the DTD, should be a root element.
      Some content
    </C>
  </B>
</A>
// <C> should go here



The bad element is detected by the DTD validator and sent to the auto correct
component. The auto corrector realises that this element has an error from the error
code passed in the call-back and tries to find out where the element should go, and sends
back the appropriate OnEndElementL() call-backs to the client.
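
One way an auto corrector could decide where a misplaced element belongs is to walk outwards through the stack of open elements until the validator accepts it, closing each level it leaves. The sketch below is an assumption about a possible policy, not the method prescribed by this document: the function CloseUntilValid, the ValidChildFn query and the use of an empty string for the document root are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// isValidChild(parent, child) plays the role of the validator query; an
// empty parent string means "document root". All names are hypothetical.
using ValidChildFn =
    std::function<bool(const std::string&, const std::string&)>;

// Pops open elements until 'child' is valid under the new top of the stack.
// Returns the names for which an end-element event should be sent, innermost
// first. If 'child' is valid nowhere, the stack is untouched and 'placed'
// is set to false.
std::vector<std::string> CloseUntilValid(std::vector<std::string>& stack,
                                         const std::string& child,
                                         const ValidChildFn& isValidChild,
                                         bool& placed) {
    // Search from the innermost open element outwards, ending at the root.
    for (std::size_t depth = stack.size() + 1; depth-- > 0;) {
        const std::string parent =
            depth == 0 ? std::string() : stack[depth - 1];
        if (isValidChild(parent, child)) {
            std::vector<std::string> closed(
                stack.rbegin(), stack.rbegin() + (stack.size() - depth));
            stack.resize(depth);
            placed = true;
            return closed;
        }
    }
    placed = false;
    return {};
}
```

For the example document, with A and B open and C valid only at the root, this yields end-element events for B and then A before C is started.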

Scenarios

Set-up a parser to parse WBXML without any plug-ins

Scenario to parse the following document:

<A>
  <B>
    Content
  </B>
</A>

1. The client creates a data supplier that contains the data to be parsed.

2. The client creates an RParserSession passing in the data supplier, MIME type for
WBXML, the MIME type of the document to be parsed and the call-back pointer
where parsing events are to be received.

3. The client begins the parsing by calling Start() on the parser session.

4. The parser makes the following call-backs to the client:

OnStartDocumentL()
OnStartElementL('A')
OnStartElementL('B')
OnContentL('Content')
OnEndElementL('B')
OnEndElementL('A')
OnEndDocumentL()

Set-up a parser to parse WBXML with a validator plug-in

The same document as 5.1 is used in this scenario.

1. The client creates a data supplier that contains the data to be parsed.

2. The client constructs a RMarkupPlugins object with the UID of a validator.

3. The client creates an RParserSession passing in the data supplier, MIME type for
WBXML, the MIME type of the document to be parsed, call-back pointer where
parsing events are to be received and the array of plug-ins object.

4. The parser session first iterates through the array of plug-ins starting from the end of
the list. It creates the CValidator ECOM object setting the call-back to the client.
The CWbxmlParser ECOM object is created next and its call-back is set to the
CValidator object. This sets up the chain of call-back events from the parser
through to the validator and then the client. The validator needs access to data from
the parser so SetParent needs to be called on all the plug-ins in the array. The
validator sets its parent to the parser object.

5. The client begins the parsing by calling Start() on the parser session.

6. The parser makes the following call-backs to the client:

OnStartDocumentL()
OnStartElementL('A')
OnStartElementL('B')
OnContentL('Content')
OnEndElementL('B')
OnEndElementL('A')
OnEndDocumentL()

Generating a WBXML document with a DTD validator
The document in 5.1 is to be generated in this scenario.

1. The client creates a data supplier with an empty buffer.

2. The client constructs a RMarkupPlugins object with the UID of a validator.

3. The client creates a RGeneratorSession passing in the data supplier, MIME type for
WBXML, MIME type of the document to be generated and the array of plug-ins object.

4. The generator session first iterates through the array of plug-ins starting from the end
of the list. It creates the CValidator ECOM object setting the call-back to the client.
The CWbxmlGenerator ECOM object is created next and its call-back is set to the
CValidator object. This sets up the chain of call-back events from the generator
through to the validator and then the client. The validator needs access to data from
the generator so SetParent needs to be called on all the plug-ins in the array. The
validator sets its parent to the generator object.

5. The client then calls the following methods:

BuildStartDocumentL()
BuildStartElementL('A')
BuildStartElementL('B')
BuildContentL('Content')
BuildEndElementL('B')
BuildEndElementL('A')
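
The back-to-front construction of the call-back chain used in the two plug-in scenarios above can be sketched as follows. This is a minimal portable C++ illustration under stated assumptions: Sink, ClientSink, TaggingPlugin and BuildChain are hypothetical names, a single string event stands in for the full call-back interface, and plug-ins are identified by plain text tags rather than ECOM UIDs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal event sink; a stand-in for the call-back interface.
struct Sink {
    virtual ~Sink() = default;
    virtual void OnEvent(const std::string& event) = 0;
};

// The client just records what reaches it.
struct ClientSink : Sink {
    std::vector<std::string> received;
    void OnEvent(const std::string& event) override { received.push_back(event); }
};

// A pass-through plug-in that tags each event so the route is visible.
class TaggingPlugin : public Sink {
public:
    TaggingPlugin(std::string tag, Sink& next)
        : tag_(std::move(tag)), next_(next) {}
    void OnEvent(const std::string& event) override {
        next_.OnEvent(event + ">" + tag_);
    }

private:
    std::string tag_;
    Sink& next_;
};

// Builds the chain back to front: the last plug-in calls the client and each
// earlier plug-in calls the one after it. Returns the sink that the parser
// or generator should call first.
Sink& BuildChain(const std::vector<std::string>& pluginTags, Sink& client,
                 std::vector<std::unique_ptr<Sink>>& owned) {
    Sink* next = &client;
    for (auto it = pluginTags.rbegin(); it != pluginTags.rend(); ++it) {
        owned.push_back(std::make_unique<TaggingPlugin>(*it, *next));
        next = owned.back().get();
    }
    return *next;
}
```

Iterating from the end of the list means each plug-in can be constructed with its downstream target already known, so the first plug-in in the list ends up being the first one called.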
Design Considerations

• ROM/RAM Memory Strategy - the string pool is used to minimise duplicate strings.

• Error condition handling - errors are returned back to plug-ins and the client via the
call-back API.

• Localisation issues - documents can use any character set and the character set is
returned back to the client in the case of parsing so it knows how to deal with the
data. For a generator the client can set the character set of the document.

• Performance considerations - the string pool makes string comparisons efficient.

• Platform Security - in normal usage the parser and generator do not need any
capabilities. However, if a plug-in were designed to load a DTD from the Internet it
would require PhoneNetwork capabilities.

• Modularity - all components in the framework are ECOM components that can be
replaced or added to in the future.
Testing

The data supplier and parser generator set-up components can be tested individually - all
the functions are synchronous and therefore no active objects need to be created for
testing.

The following steps can be carried out to test parsing and generation of WBXML or
XML:

1. Load a pre-created file.

2. Parse the file.

3. Generate a buffer from the output of the parser.

4. Compare the output of the buffer with the original pre-created file to see if they
match.

Additional tests are carried out to test error conditions of parsing, such as badly
formatted documents and corrupt documents.

Glossary

The following technical terms and abbreviations are used within this document. 

XML               Extensible Markup Language 
WBXML             WAP Binary Extensible Markup Language 
SAX               Simple API for XML 
DOM               Document Object Model 
Element           This is a tag enclosed by angle brackets, e.g. <Name>, <Address>, 
                  <Phone> etc. 
Attributes        These are the attributes associated with an element, e.g. <Phone 
                  Type="Mobile">. The attribute here is "Type". 
Values            These are the actual values of an attribute, e.g. <Phone 
                  Type="Mobile">. The value here is "Mobile". 
Content           This is the actual content for an element, e.g. 
                  <Name>Symbian</Name>. Here "Symbian" is the content for the 
                  element "Name". 
DTD               Document Type Definition 
MIME              Multipurpose Internet Mail Extensions 
Code Page         Since only 32 elements can be defined in WBXML, code pages are 
                  created so that each code page can have 32 elements. 
XSLT              Extensible Style-sheet Language Transformations 
SOAP              Simple Object Access Protocol 
URI               Uniform Resource Identifier 
qualified name    A qualified name specifies a prefix:local name, e.g. 'HTML:B' 
prefix            From the qualified name example this is 'HTML' 
local name        From the qualified name example this is 'B' 
Appendix A - Auto correction examples 

Table A1 shows a situation where the end tags for A and B are the wrong way round. 
This is very easy to fix: since the DTD validator keeps a stack of the tags, it knows what 
the end tag should be. 



<A>Content 
<B> 
More content 
</A> 
</B> 

Table A1: End tags that are the wrong way round 



Table A2 shows the situation where the B end tag is missing. Since the end tag does not 
match, a guess can be made that there should be an end tag for B before the end tag of A. 

<A>Content 
<B> 
More content 
</A> 

Table A2: Missing end tag 
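
Both corrections above follow from keeping a stack of open elements. The sketch below is one possible policy, written in standard C++. `EndTagCorrector` is a hypothetical class, not part of the framework API, and it simply records the tags it would report to the client. On an end tag that does not match the innermost open element, it closes the intervening elements first. An end tag with no matching open element is dropped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical auto-correction helper built on a stack of open elements.
class EndTagCorrector {
public:
    void startElement(const std::string& name) {
        open_.push_back(name);
        emitted_.push_back("<" + name + ">");
    }

    void endElement(const std::string& name) {
        // Search the stack from the innermost open element outwards.
        std::size_t depth = open_.size();
        while (depth > 0 && open_[depth - 1] != name) --depth;
        if (depth == 0) return;  // no such open element: drop the stray end tag

        // Close everything nested inside the matched element, then the
        // matched element itself.
        while (open_.size() >= depth) {
            emitted_.push_back("</" + open_.back() + ">");
            open_.pop_back();
        }
    }

    // Tags reported to the client so far, in order.
    const std::vector<std::string>& emitted() const { return emitted_; }

private:
    std::vector<std::string> open_;     // element stack
    std::vector<std::string> emitted_;  // corrected tag sequence
};
```

Feeding the tags of Table A1 (start A, start B, end A, end B) or Table A2 (start A, start B, end A) through this class yields the same corrected sequence in both cases: the end tag for B is reported before the end tag for A.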



Table A3 shows the situation where there are no end tags for A and B. The DTD 
validator will detect the problem and send an end tag for B to the client. The auto correct 
component will query the DTD validator as to whether the C tag is valid for the parent 
element A. If it is valid, an OnStartElementL() will be sent to the client; otherwise the 
auto correct component can check further up the element stack to find where this 
element is valid. If it is not valid anywhere in the stack then it will be ignored, together 
with any content and end element tag. 

<A>Content 
<B> 
More content 
<C> 
Some content 
</C> 

Table A3: Missing end tags 
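
The stack walk described for Table A3 can be sketched as follows, in standard C++. The `Dtd` map, `isValidChild` and `placeStartTag` are hypothetical stand-ins for the DTD validator query and the auto correct component. The sketch assumes at least one element is already open, so a root element would need separate handling.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the DTD: parent element -> permitted children.
using Dtd = std::map<std::string, std::set<std::string>>;

bool isValidChild(const Dtd& dtd, const std::string& parent,
                  const std::string& child) {
    auto it = dtd.find(parent);
    return it != dtd.end() && it->second.count(child) > 0;
}

// Tries to place a start tag. Walks up the stack of open elements to the
// nearest one that permits `child`, emits end tags for the elements closed
// on the way, then emits the start tag. Returns false and changes nothing
// if `child` is not valid anywhere in the stack; the caller would then
// ignore the element's content and end tag as well.
bool placeStartTag(const Dtd& dtd, std::vector<std::string>& open,
                   const std::string& child,
                   std::vector<std::string>& events) {
    std::size_t depth = open.size();
    while (depth > 0 && !isValidChild(dtd, open[depth - 1], child)) --depth;
    if (depth == 0) return false;

    while (open.size() > depth) {
        events.push_back("</" + open.back() + ">");
        open.pop_back();
    }
    open.push_back(child);
    events.push_back("<" + child + ">");
    return true;
}
```

With a DTD in which A permits B and C but B permits nothing, and with A and B open, placing C emits an end tag for B followed by the start tag for C, which matches the behaviour described for Table A3.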
Appendix B - How to write a namespace plug-in 

The tables below show the WBXML tokens for the example namespace. Tables 1 to 3 
each represent a static string table. Table 1 shows the elements for code page 0. Tables 2 
and 3 hold the name part and the value part, respectively, of the attribute name/value 
pairs. Each attribute index in Table 2 refers to the value at the same index in Table 3, so 
the token values must match up in Tables 2 and 3. If an attribute does not have a value 
then there must be a blank entry, as shown in Table 3 for token 8. Standalone attribute 
values also appear in Table 3, but have a WBXML token value of 128 or greater. 



Element type name    WBXML token 
Addr                 5 
AddType              6 
Auth                 7 
AuthLevel            8 

Table 1: ElementTable0, code page 0 


Attribute name/value pair (attribute part)    WBXML token 
TYPE                                          6 
TYPE                                          7 
NAME                                          8 
NAME                                          9 

Table 2: AttributeValuePairNameTable, code page 0 



Attribute name/value pair (value part)    WBXML token 
ADDRESS                                   6 
URL                                       7 
(blank)                                   8 
BEARER                                    9 
GSM/CSD                                   128 
GSM/SMS                                   129 
GSM/USSD                                  130 

Table 3: AttributeValuePairValueTable, code page 0 
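
To illustrate how the name and value tables pair up by token, here is a small lookup sketch in standard C++. The row data is taken from Tables 2 and 3. The `AttributeRow` and `ValueRow` types and the two lookup functions are hypothetical and are not the plug-in's actual interface.

```cpp
#include <string>

// One row per attribute-start token: the name comes from Table 2 and the
// value part from the row of Table 3 with the same token.
struct AttributeRow {
    int token;
    const char* name;   // Table 2 (attribute part)
    const char* value;  // Table 3 (value part); "" means no value
};

const AttributeRow kAttributeRows[] = {
    {6, "TYPE", "ADDRESS"},
    {7, "TYPE", "URL"},
    {8, "NAME", ""},
    {9, "NAME", "BEARER"},
};

// Standalone attribute values from Table 3 (tokens of 128 or greater).
struct ValueRow {
    int token;
    const char* value;
};

const ValueRow kValueRows[] = {
    {128, "GSM/CSD"},
    {129, "GSM/SMS"},
    {130, "GSM/USSD"},
};

// Returns the row for an attribute-start token, or nullptr if unknown.
const AttributeRow* findAttribute(int token) {
    for (const AttributeRow& row : kAttributeRows) {
        if (row.token == token) return &row;
    }
    return nullptr;
}

// Returns the string for an attribute-value token, or "" if the token is
// below 128 or not in the table.
std::string findAttributeValue(int token) {
    if (token < 128) return "";
    for (const ValueRow& row : kValueRows) {
        if (row.token == token) return row.value;
    }
    return "";
}
```

For example, token 7 resolves to the name TYPE with the value part URL, token 8 resolves to NAME with no value, and token 129 resolves to the standalone value GSM/SMS.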



The following string table files (.st) are created for each table: 

# Element table for code page 0 
stringtable ElementCodePage0 
EAddr Addr 
EAddType AddType 
EAuth Auth 
EAuthLevel AuthLevel 

String table for Table 1 

# Attributes table for code page 0 
stringtable AttributesCodePage0 
EType Type 
EType Type 
EName Name 
EName Name 

String table for Table 2 

# Attribute values table for code page 0 
stringtable AttributeValuesCodePage0 
EAddress Address 
EURL URL 
EBearer BEARER 
EGSM_CSD GSM/CSD 
EGSM_SMS GSM/SMS 
EGSM_USSD GSM/USSD 

String table for Table 3 



WO 2005/036390 PCT/GB2004/004276 

Example usage of API 

The example below shows how to set up the parser and generator with DTD 
checking and auto correction. 

RMarkupPlugins plugins; 
plugins.Append(KMyValidator); 
plugins.Append(KMyAutoCorrector); 

CDescriptorDataSupplier* dataSupplier = CDescriptorDataSupplier::NewLC(); 
RParserSession parser; 
parser.OpenL(dataSupplier, MarkupMimeType, DocumentMimeType, callback, plugins); 
parser.Parse(); 

// Callback events will be received 
parser.Close(); 

// Now construct a generator using the same plug-ins and data supplier 
RGeneratorSession generator; 
generator.OpenL(dataSupplier, MarkupMimeType, DocumentMimeType, callback, plugins); 

generator.BuildStartDocumentL(); 
RAttributeArray attributes; 

// Get an RString from the ElementStringTable 
RString string = generator.StringPool().String(ElementStringTable::Tag1, ElementStringTable); 

// Build one element with content 
generator.BuildStartElementL(string, attributes); 
generator.BuildContentL(_L8("This is the content")); 
generator.BuildEndElementL(string); 
generator.BuildEndDocumentL(); 
generator.Close(); 
Appendix C 

How the String Pool is used to parse both text and binary mark-up language 

The Mark-up Language framework design relies on the fact that it is possible (using the 
'String Pool' techniques described below, although other mapping techniques can also be 
used) to provide the same interface to clients regardless of whether text or binary 
mark-up language is used. 

Text based mark-up languages use strings, i.e. sequences of characters or binary data. In 
the String Pool technique, static tables of these strings are created at compile time, with 
one string table per namespace, for all the elements, attributes and attribute values 
needed to describe a particular type of mark-up document. Each element, attribute and 
attribute value is assigned an integer number and these integer 'handles' form an index of 
the strings. A string in an XML document can be rapidly compared to all strings in the 
string table by the efficient process of comparing the integer representation of the string 
with all of the integer handles in the static string table. The main benefit of using a string 
pool for parsing is therefore that it makes it very easy and efficient for the client to check 
for what is being parsed, since handles to strings are used instead of actual strings. This 
means only integers are compared rather than many characters, as would be the normal 
case if string pools were not used. Also, comparisons can be carried out in a simple 
switch statement in the code, making the code efficient, and easier to read and maintain. 
Hence, the string pool is used to make string comparisons efficient at the expense of 
creation of the strings. 
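
The handle-based comparison can be illustrated with a minimal sketch in standard C++. It is not the Symbian string pool implementation. The `StringPool` class, the `kNoHandle` constant and the `describe` function are hypothetical, as are the descriptive labels returned by `describe`. The element names are taken from Table 1 of Appendix B.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Integer handles for a static string table; the enumerator order matches
// the order of kElementTable below.
enum ElementHandle { EAddr, EAddType, EAuth, EAuthLevel };
const int kNoHandle = -1;

const std::vector<std::string> kElementTable = {
    "Addr", "AddType", "Auth", "AuthLevel"};

class StringPool {
public:
    explicit StringPool(const std::vector<std::string>& table) {
        for (std::size_t i = 0; i < table.size(); ++i) {
            handles_[table[i]] = static_cast<int>(i);
        }
    }

    // The cost of "creating" a string is paid once here, as a single lookup
    // that turns the text into its integer handle.
    int open(const std::string& text) const {
        auto it = handles_.find(text);
        return it == handles_.end() ? kNoHandle : it->second;
    }

private:
    std::unordered_map<std::string, int> handles_;
};

// Client code compares integers only, so a switch statement suffices.
std::string describe(int handle) {
    switch (handle) {
        case EAddr:      return "address";
        case EAddType:   return "address type";
        case EAuth:      return "authentication";
        case EAuthLevel: return "authentication level";
        default:         return "unknown";
    }
}
```

A client that receives a handle from the parser can therefore dispatch on it with `switch (handle)` and never needs to compare character data itself.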


For binary mark-up language (e.g. WBXML) the situation is more complex since there 
are no strings in WBXML. In WBXML, everything is tokenised (i.e. given a token 
number). We get around the absence of strings as follows: a table of mappings of each 
of the WBXML tokens to the index of the string in the string table is created (see Figure 
8). Each mapping is given a unique integer value - a handle. Since it is required to map 
from tokens to strings and vice versa, two lists of integer value handles are created: one 
indexed on tokens and the other indexed on the index of the position in the string table. 
This is so that it is quick to map from one type to the other. All this is encapsulated in 
the namespace plug-in and therefore is insulated from the client, parser and generator. 
The client can therefore parse a binary or text document without having to know about 
the specific format - it simply uses the integer handle (RString), which will work 
correctly for both text and binary mark-up languages. 
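
The two-way mapping between WBXML tokens and string table positions can be sketched as below, in standard C++. `TokenMap` is a hypothetical class rather than the namespace plug-in's real interface, and it treats the handle as the index into the string table. Unknown tokens and out-of-range handles are reported as -1, which is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

class TokenMap {
public:
    // tokens[i] is the WBXML token assigned to table[i]; both lists are
    // assumed to have the same length.
    TokenMap(const std::vector<std::string>& table,
             const std::vector<int>& tokens)
        : table_(table), indexToToken_(tokens) {
        for (std::size_t i = 0; i < tokens.size(); ++i) {
            tokenToIndex_[tokens[i]] = static_cast<int>(i);
        }
    }

    // Parsing direction: token -> handle (string table index), or -1.
    int handleForToken(int token) const {
        auto it = tokenToIndex_.find(token);
        return it == tokenToIndex_.end() ? -1 : it->second;
    }

    // Generating direction: handle -> token, or -1 if out of range.
    int tokenForHandle(int handle) const {
        if (handle < 0 ||
            static_cast<std::size_t>(handle) >= indexToToken_.size()) {
            return -1;
        }
        return indexToToken_[static_cast<std::size_t>(handle)];
    }

    // Text for a valid handle; throws std::out_of_range otherwise.
    const std::string& text(int handle) const {
        return table_.at(static_cast<std::size_t>(handle));
    }

private:
    std::vector<std::string> table_;             // static string table
    std::vector<int> indexToToken_;              // indexed on table position
    std::unordered_map<int, int> tokenToIndex_;  // indexed on token
};
```

Using the element table of Appendix B (tokens 5 to 8), token 7 maps to handle 2, whose text is Auth, and handle 2 maps back to token 7. The client sees the same handle whether the document was text or binary.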



